arXiv:1502.01822v3 [math.NA] 30 Aug 2015 


Solving systems of phaseless equations via Kaczmarz methods: A 

proof of concept study 

Ke Wei* 

September 1, 2015 


Abstract 

We study the Kaczmarz methods for solving systems of phaseless equations, i.e., the general¬ 
ized phase retrieval problem. The methods extend the Kaczmarz methods for solving systems of 
linear equations by integrating a phase selection heuristic in each iteration and overall have the 
same per iteration computational complexity. Extensive empirical performance comparisons 
establish the computational advantages of the Kaczmarz methods over other state-of-the-art 
phase retrieval algorithms both in terms of the number of measurements needed for successful 
recovery and in terms of computation time. Preliminary convergence analysis is presented for 
the randomized Kaczmarz methods. 

Keywords. Generalized phase retrieval, Kaczmarz methods, alternating projection, phase 
selection heuristic. 
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1 Introduction 

1.1 The problem of phase retrieval 

In general, phase retrieval is about recovering a vector from the magnitude measurements, or 
equivalently solving a system of phaseless equations: 

Vr = \{ar,x)\^ , r = 1, • • • ,m, (1) 

where x G C” and a,. G C”. Let A G be a matrix whose rows are {a*}i<r<m and y = 

(yi,-’’ The set of quadratic equations in Q can be formulated as yy = \Ax\, where 

\/y := and \Ax\ := (| (ai, x)| , • • • ,\{am., x)\)'^. Despite its simple form, phase 

retrieval arises in a wide range of practical context such as X-ray crystallography [T], diffraction 
imaging [2] and microscopy [3], where the detector cannot measure the phase of the optical wave 
directly but only its magnitudes. 

Let X be a solution to the phase retrieval problem. Apparently, xe*® is also a solution for any 
9 G [0, 27r). Therefore the uniqueness of the solution to 0 can only be defined up to a global phase 
factor. It has been shown that 2n — 1 generic magnitude measurements suffice to determine a unique 
solution to Q if the measurement vectors {ar}i<r<m and decision variables x are real-valued [3], 
while 4n — 4 measurements are sufficient [5] for the complex measurements and variables. 
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1.2 Existing methods for phase retrieval 

In this section, we review some of the existing methods for phase retrieval. The classical algorithms 
for phase retrieval are Error Reduction (ER, also known as Gerchberg-Saxton) and its variants 
which were pioneered by Gerchberg and Saxton [6] and Eienup mu- Error Reduction is an 
alternating projection algorithm between the image and signal spaces, see Alg. In each iteration, 
the current estimate is first projected in the image space so that the magnitudes of its image match 
the measurements and then the new estimate is obtained by a least squares fitting. The first two 
steps of Alg. can also be interpreted as a phase selection heuristic which approximates the image 
phase of the solution by that of the current estimate. As suggested by its name, ER satisfies the 


Algorithm 1 Error Reduction (ER) 

Initilization: xq 
for / = 0,1, • • • do 

1. 9i = ZAxi, where the angle is computed for each entry of Axi 

2. zi = ^ Q © is the elementwise multiplication 

3. = argmin \\Ax — zi \\2 = A^zi 

X 

end for 


residual reduction property |||Ax;+i| — v ^|||2 < ||l^3;/| — y ^||2 which follows straightforward from 
the short calculation: 


\\\Axi+i \-^\\2 = 


< 

< 


|Axz+i| - 1^0 ^ 

Axij^i - Vy 0 

2 

Axi- ,/yQ 

2 

\Axi\ - Vy0e*®' 

\\\^xi\ -^/y\\2, 


( 2 ) 


where the first inequality follows from the triangle inequality and the second inequality follows 
from the definition of xi^i- When A corresponds to Fourier transform, the last two steps in Alg. 
essentially adjust the phase of the measurements and then seek a signal which fits the magnitude 
measurements. Moreover, if the signal is known to belong to a convex set C, the new estimate can 
be obtained by further projecting A^zi onto C as 

xz+i = Pc (^Ahi'j . 

Let A be a linear map from n x n Hermitian matrices to m dimensional vectors defined as 


A{X)r = a*Xar, r = 1, • • • , m. 

Then the magnitude measurements in Q can be lifted up as the linear measurements with respect 
to the rank one matrices of the form X = xx*: 


yr = \a*x\^ = {a*x){x* ttr) = a*Xar = A{X)r. 
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Based on this observation, finding the solution to 0 is equivalent to a rank minimization problem 

min rank(X) 

s.t. A{X) = y (3) 

X ^ 0 

by noting that the optimal solution to (§ coincides with xx* when x is a solution to 0. However, 
rank minimization is a non-convex optimization problem. Inspired by recent work on matrix com¬ 
pletion, Candes et al. [9] propose an approach called PhaseLift by replacing the rank functional in 
Q with the matrix trace norm and instead solving a semidefinite programming 

min trace(X) 

s.t. A{X) = y (4) 

X ^ 0. 

For certain random models, it can be established that Q and Q share the same optimal solution 
if the number of measurements is (nearly) proportional to the size of the signal |10l [TTl I12j . In 
|13| . a different semidefinite programming called PhaseCut is proposed by splitting the phase and 
magnitude variables. 

Very recently, a line search algorithm called Wirtinger Flow see Alg.[^ has been developed 
by applying gradient descent iterations to the loss function 

fi^) = ^\\\M‘^-y\\l- (5) 

In each iteration of Wirtinger Flow, the current estimator xi is updated along the gradient descent 
direction —Vf{xi) with the stepsize yi. Although f{x) is not a convex function globally]^ it has 

Algorithm 2 Wirtinger Flow 
Initilization: xq, yi 
for / = 0,1, • • • do 

1. V/(xz) = ((|Axip -y)Q (Axi)) 

2. x^+i = xi- 7|-^V/(xz) 

11^0112 

end for 


been proven in m that for certain random measurement vectors, with high probability /(x) is 
a strongly convex function in a neighbourhood of x when the number of measurements is nearly 
proportional to the length of the measured vector. Therefore exponential convergence (or linear 
convergence) can be established for Wirtinger Flow if the initial point xq is selected to be in this 
small neighbourhood of x. 

Moreover, there have been many algorithms designed especially for compressive phase retrieval 
problems in which the signals are known to be sparse, see [Ems] and references therein. 

^When Ur (r = 1, ■ • ■ ,m) are real-valued vectors, f{x) is a degree-four polynomial which is generally non-convex, 
for example ~ 1)^, a; G R. 
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1.3 Kaczmarz methods for linear equations 


The simple Kaczmarz method also known as the Algebraic Reconstruction Technique (ART) 
jl8| . is an iterative projection algorithm which was initially designed for solving a system of linear 
equations Ax = y. In the Rh iteration, if the rth row of A is selected, then the new estimate is 
obtained by projecting current estimate xi onto the hyperplane determined by the linear equation 
(or, x) = yr as 


Xl+i =Xl + 


Ur 1 ^l) 


The deterministic version of the simple Kaczmarz method usually sweeps through the rows of A in 
a cyclic manner. The corresponding convergence results that appear in the literature are all based 
on the quantities of A that are hard to compute and involve the expressions which do not have 
a clear geometric meaning [HEniEI]. Strohmer and Vershynin [22j provide the first exponential 
convergence analysis for a randomized version of the simple Kaczmarz method in terms of the 
scaled condition number of A defined as 


k{A) = ||A||,.||ylt||2, 


where denotes the Moore-Penrose pseudo-inverse of A. In each iteration, the randomized Kacz¬ 
marz method randomly picks up a row of A with the probability proportional to the square norms 
of the rows, that is 


P {the rth row is selected} 



r = 1, • • • , m. 


( 6 ) 


In the block Kaczmarz method |23j . instead of selecting a single row of A in each iteration, 
a subset of rows are selected, denoted by Ap. Then the current estimate xi is updated by being 
projected onto the solution space of A-px = br, 


xi+i =xi + aI {yr - Arxi). 


The exponential convergence of the randomized block Kaczmarz method can be similarly established 
provided the partition of the measurement matrix is known, see |24] . 

When the linear system is inconsistent, exponential convergence to a neighbourhood of the 
desired solution is established for the randomized simple and block Kaczmarz methods in |25) and 
|24j . In order to apply the Kaczmarz methods to solve the least squares problems, the extended 
Kaczmarz methods are designed to simultaneously decrease the system inconsistency by orthog¬ 
onal column projections and update the estimate by the Kaczmarz methods |26l [27] . Additional 
references for Kaczmarz method include |28 | (2^ [30] , In the Kaczmarz methods, the residual does 
not decrease monotonically. In each iteration, while the residual of the selected rows reduces to 
zero, the residual of the other rows will increase. 

1.4 Main contributions and outline 

The main contributions of this manuscript are: 

1. We introduce a family of Kaczmarz methods to find the solutions to Eq. Q . The new methods 
are developed by integrating a phase selection heuristic into the classical Kaczmarz methods 
for linear equations. 
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2. Extensive numerical experiments demonstrate the feasibility of applying the Kaczmarz meth¬ 
ods for linear equations to solve the generalized phase retrieval problems, and establish the 
computational advantages of the Kaczmarz methods over prior art. 

The rest of this manuscript is organised as follows. In section we derive and discuss the 
properties of the Kaczmarz methods for phase retrieval. In section we present detailed numerical 
comparisons of the Kaczmarz methods with ER and Wirtinger Flow. The preliminary convergence 
analysis for the randomized Kaczmarz methods is presented in section Section concludes this 
manuscript with some future research directions. 

2 Kaczmarz methods for phase retrieval 

2.1 The simple Kaczmarz method 


Algorithm 3 Simple Kaczmarz 


Initilization: xq 
for / = 0,1, • • • do 

1. select a row of A, denoted by a*, 

2 . 0i — Z. {cLr, Xi) 

o I vlTe*®*-(driXi) 

3 . Xl+1 = Xl + ^ 2 - -Or 

end for 


either in a deterministic manner or randomly 


We first present the simple Kaczmarz method for the phase retrieval problem 0- see Alg. i In 
each iteration of the simple Kaczmarz method, it firstly selects a row of the measurement matrix 
either in a deterministic (e.g., cyclic) manner or randomly, and then projects the current estimate 
xi onto the hyperplane 

|x : (or, x) = I with 0/= Z (a^, x^) G [0, 27r), 

where the image phase of the solution is approximated by that of the current estimate. The selection 
of 6i can also be interpreted in another way. Suppose in the /th iteration, we pick up an arbitrary 
9 G [0, 27r), and then project xi onto the hyperplane 

|x : (ar,x) = • 

Similarly the projection is given by 


xf+i 


- {ar,xi) 

~t“ II 112 • 


Among all the candidates of it can be easily verified that x^+i is the one that minimizes the 
distance between xfj^^ and x;, or equivalentlj0 


6i = argmin 
e 


xf+i - XI 


^Note when {ar,Xi) 
the literature. 


0, any 6 € [0, 27r) minimizes ||xf+i — 


in which case we will set = 0 as is typical in 
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Therefore is indeed the projection of xi onto the set of hyperplanes 

{x\ \{ar,x)\=^}, 

and Alg. can be viewed as a non-convex Kaczmarz method. 

2.2 The block Kaczmarz method 


Algorithm 4 Block Kaczmarz 

Initilization: xq, partition T = {Ti, • • • , TA^i} of the row indices {1, • • • , m} 

for / = 0,1, • • • , do 

1. select a block from T either in a deterministic manner or randomly 

2. 9i = ZAr^xi, where the angle is computed for each entry of Ar^xi 

3. xi+i =xi + a[^ {y/m^ O e*®' - Ar^xi) 

end for 


The block Kaczmarz method for phase retrieval (Alg. begins with a partition of the measure¬ 
ment matrix into a number of blocks. In each iteration, it firstly selects a block, denoted by 
either deterministically or randomly and then projects the current estimate xi onto the intersections 
of the hyperplanes determined by 


|x : Ar^x = yr^ © , with 6i = ZAr^xi. 

The pseudo-inverse in the third step of Alg. returns a solution of minimum £2 norm to an 
underdetermined least squares problem. The block Kaczmarz method for phase retrieval has the 
following property. 


Proposition 2.1. Let A-p^ and Ap be two block submatrices of A and Ar^ur = 


be their 


concatenation. Assume rank(Arjurj) = Ihil -|- iTjl < n and A^.Ap. = 0. Then applying two 
iterations of the block Kaczmarz method to the blocks Tj and Tj one after another is equivalent to 
applying one iteration of the block Kaczmarz method to the block Tj U Tj. 


Proof. Let xq G C” be an arbitrary point. Two iterations of the block Kaczmarz method applied 
to the blocks Tj and Tj gives 


xi= xq + Ajj, (^./yp~ 0 - Ap^xo^ 

X 2 = xi + aI^ (^y/yr ~0 - Ap.xi'^ . 


One iteration of the block Kaczmarz method to the block Tj U Tj gives 

X12 = xo + Af,,ur. 0 - Ar^ur^a^o) . 

^In the block Kaczmarz method, we assume that the row submatrix Ar^ is fat (i.e., Trl < n) following the 
literature of the block Kaczmarz method for linear equations; though without this assumption ER can be viewed as 
a special instance of the block Kaczmarz method with only one block. 
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From the condition = 0, we have 


= ^ri^rj(^r,-4r^) = 0 and Ar^A}., = ^r^.Ap^(^r.ArJ = 0, 


which leads to 

Therefore 




i- i J- 1 


xi2 = a;o + (v^yr^ 0 - ^r.ur.a^o 


= ico + 




/-/-s iZ-Ay' xq 

.JWjQe 0 


AviXo 

At.xq 


xo + Ap, (Vyr\ 0 e*-^^r0o _ yip.xo 
yp,. 0 - Ap^ xo) 


+^p 


= xi + Ajl ( O - ApjXo 


iZAp.xo 


= xi + A^ l 0 - Ar^xi) = X2, 


iZAr -xi 


where the second to last equality follows from the fact ^p xo = ^p xi since ^dp-^t = 0. 


□ 


Proposition (2.1) implies that if the submatrix ^dp^ consists of jr^l orthogonal rows, then one 
iteration of the block Kaczmarz method is equivalent to iFj-l iterations of the simple Kaczmarz 
method. In the Kaczmarz methods for linear equations, it is trivial that successive projections 
onto a set of orthogonal affine spaces returns the projection onto the intersection of the spaces. 
Proposition (2.1) confirms that this property still holds when applying the Kaczmarz methods to 
solve systems of quadratic equations, without being influenced by the phase selection heuristic. 

A natural question arises following the discussion in section 2.1 If IP^I > 1, whether x;_|_i is the 
projection of xi onto the non-convex set 


{x : |Ap^x| = y/yiz}^ 


(7) 


Since for an arbitrary phase factor e*^, the projection of xi onto the set {x ; Ap^x = is 

given by x®^_^ = x; + (^yp^ 0 — Ap^x;), this question can be reformulated as whether 


min 

6 


{V^ ® “ Ar^xi'^ 


( 8 ) 


attains its minimum at 0i. Unfortunately the answer is negative unless Ap has some special 
structures, such as Ap is a column orthonormal matrix in which case we have 


At 

i r 


yvr 0 e*® - Ap^x; 


0 - Ay,XI 


A simple counterexample in is given below to support this argument. 
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Example 2.2. Let 


= 

i r 


2 1 
1 0 


, Ar^xi = 


Then e*®* = sign(Ar^xz) = [l and the objective function of Q evaluated at e*®* is ^/5, which 

is clearly larger than 1 when e*® = [l ~l]^- 

Let u = e*®. Then ([8|) can be further reformulated as a non-convex optimization problem 

A'r iVy^ 0 ^ 


mm 


s.t. \u\ = 1. 


So in general computing the solution to Q is not tractable. However, the following theorem bounds 
the deviation of from the optimal projection of xi onto the feasible set Q in terms of the 
condition number of the submatrix. 

Theorem 2.3 (Projection Error Analysis). Let 6 be the minimum of Q and xf^^ be the corre¬ 
sponding projection of xi. The difference between xf-^ and xi+i can be bounded as 






diag ( e*® - e*®' 


I2 ) 


(9) 


where ^(Ar ) = ) (j^g^otes the condition number o/Ar , and x is a solution to the phase 

<^min / 

retrieval problem g. 

Proof. The result follows from the calculation: 


xf+i - Xl+l 


a: 


r. 




< 


^r.diag \Ar,x 

diag - e*®' 
diag (e^^ - e*®' 


A 


II^hI 




I2 > 


where the second equality follows from the fact |Ar,,x| = y/Wf diag — e*®'j denotes a 
diagonal matrix with the diagonal entries being the vector — e*®'. □ 

2.3 Convergence resnlts for the randomized Kaczmarz methods 

Let X G C” be a solution to the phase retrieval problem Q. For any x G C”, the distance of x to 
X is defined as 


dist(x,x)= min 
6 »e[ 0 , 27 r) 


X — xe 


id 


As previously stated in section 1.3, the randomized Kaczmarz methods for linear equations have 
been well-studied recently and the corresponding convergence results only depends on the quantities 
of the matrix which are widely used in numerical linear algebra. In this section, we will present 
preliminary convergence analysis of the randomized Kaczmarz methods for phase retrieval. For 
ease of exposition, we assume the measurement matrix A is standardized, that is each row of A has 
unit (.2 norm. In addition, we assume the rows or submatrices are selected uniformly at random in 
each iteration. 


















































Theorem 2.4. Let A G ([^'r^xn ^ standardized matrix with full column rank and y = \Ax\^ for 

some X € C^. For an arbitrary initial estimate xq, the iterates xi {I > 0) produced by the randomized 
simple Kaczmarz method satisfy 


E [dist^(x;, x)] < ( 1 — 


a. 


mm 

m 


(A) 


dist^(xo, x) + 


a. 


Am 


( 10 ) 


Basically, theorem 2.4 states that for any given initial point, the exponential convergence of the 
randomized simple Kacmarz method can be guaranteed until the iterate reaches a neighbourhood 
of X. However, the size of the neighbourhood is quite pessimistic. For example, if H is a matrix 
whose rows are independent spherical random vectors, a ‘^-^^ {A) will be proportional to m/n [3T] 
and the size of the neighbourhood is proportional to the size of the measured vector. The proof of 
theorem 12.41 will be deferred to section |4l 

To state a similar result for the randomized block Kaczmarz method, we need the following 
definition introduced in 


Definition 2.5. An {Nf,, a, (5) row paving of a matrix A is a partition T = {Fi, • • • , FjVi,} of the 
row indices that satisfies 

a < o-^in(^rJ < < fi for each F^ G T. 


With this definition, we have the following property for the randomized block Kaczmarz method. 

Theorem 2.6. Let A G be a standardized matrix with full column rank and y = \Ax\^ for 

some X G C”. For an arbitrary initial estimate xq, the iterates xi {I > 0) produced by the randomized 
block Kaczmarz method satisfy 


E [dist^(x«,x)] ^ (^1 - dist^(xo,x) 


+ 


Afi 


acr„ 


M) 


( 11 ) 


The proof of theorem 2.6 follows the proof of theorem 2.4 and is omitted for brevity. In |24] . the 
authors review several approaches of constructing good pavings such that the condition numbers 
of all the submatrices are uniformly small. From theorem 2.3 we can see that the paving of A also 
has an effect on the approximations of the optimal projections. 


3 Numerical experiments 

In this section, we present the empirical observations of the Kaczmarz methods as compared with 
ER (Alg. and Wirtinger Flow (Alg. [^. ER and Wirtinger Flow are selected due to their 
applicability for signals without assuming any a priori knowledge, flexibility in terms of the types of 
measurements, and reported efficiency as greedy algorithms. We only present the empirical results 
of the deterministic Kaczmarz methods which go through the rows or partitions of the measurement 
matrix cyclically. The randomized Kaczmarz methods which select the rows or submatrices with 
the probability proportional to their sizes exhibit similar performance in our test cases. All the 
tested algorithms are implemented in Matlab with the code for Wirtinger Flow downloaded from 
the author’s website. The tests are conducted on a desktop computer with 4-core 3.2GHz processors 
and 8GB memor}]^ 

^The codes and data to reproduce all the figures and tables in the manuscript can be downloaded from 
https://github.com/makwei/phase-kacz. 
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3.1 Implementation details 

3.1.1 Termination conditions 


ER and Wirtinger Flow will be terminated after they reach a maximum number of iterations or 
the relative residual is small, that is 


\\^xi\‘^ - y\ 

II2/II2 


< ei 


( 12 ) 


for some ei > 0. For the Kaczmarz methods, we define a cycle as the number of iterations the 
algorithms take to touch each row of the measurement matrix once. So a cycle consists of m 
iterations for the simple Kaczmarz method and Nb iterations for the block Kaczmarz method. The 
Kaczmarz methods will be terminated after they reach a maximum number of cycles. To verify the 
relative residual criterion (12) in each iteration is somewhat expensive for the Kaczmarz methods as 
it requires a matrix-vector product involving the full matrix A. The typical approach is to evaluate 
it after a fixed number of cycles. However, in our implementations, we will check whether 

'Mr.Xip - 7/rJI 


max 


I \ 


Vrl 


< €2 or max 


< 62 


(13) 


\yr\ IlyrJIa 

is satisfied or not after each cycle for some 62 > 0 , where the maximum is taken over a cycle of 
iterations. Note that xi is the iterate before the projection, so \a*xi\‘^ / yr and / yr^- 


3.1.2 Solving the least squares subproblems 

For the unstructured measurement matrix, one can apply either direct methods or iterative methods 
to solve the overdetermined least squares subprobblems in ER and the underdetermined least 
squares subproblems in the block Kaczmarz methods, depending on the sizes and conditions of the 
matrices or submatrices. In our tests for the Gaussian measurement matrix, we use Matlab peg with 
warm starting to solve the overdetermined least squares subproblems in ER and Matlab backslash 
to solve the underdetermined least squares subproblems in the block Kaczmarz methods. 


3.2 Random experiments 
3.2.1 ID set-up 

The measured vectors x are drawn from the Gaussian distribution, that is x ~ N'{0,ln) when 
X G M"" and x ~ AA(0, In) + iM{0, In) when x G C”. The algorithms are tested with three different 
measurement models: 

• the Gaussian model with entries of A drawn i.i.d from either J\f{0, 1) for real signals or 
M{0, 1/2) -I- iAf{0, 1 / 2 ) for complex signals. 


• the unitary model with A being the concatenation of unitary matrices: 


Qi 


A = 




where Qi, £=l ,---,[^]—1 are unitary matrices and is a row submatrix of a unitary 

matrix. We generate each Qi from the QR factorization of a random Gaussian matrix. 
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• the coded diffraction model for complex signals with 


Ax 


(F{di © x) 
\F{dL 0 x) 


(14) 


where F denotes the n x n DFT matrix and di G C”, i = 1, • • • ,L are a series of coded 
diffraction patterns (GDPs). In this paper, we will consider the octonary pattern suggested 
in [121 E!) but note that other patterns are also possible, see na M- In the octonary 
pattern, every entry of di is a random variable of the form bib 2 , where bi takes the value in 
{ 1 , —l,i, —i} with equal probability and 62 takes the value of V2/2 or with probability 
4/5 and 1/5 respectively. 


For the coded diffraction model, the pseudo-inverse of A in Alg. is given by 


At 




^iF*ze)Qide/^\d,(^ 


\ZL, 


e=i 


£=1 


where Z£ G C”, i = 1, ■ ■ ■ , L. In the block Kaczmarz method, we will assume that all the rows of 
the submatrix Ap correspond to the same coded diffraction pattern, that is for any x G 


Ayx = Fr{di 0 x) for some 1 < £ < L. 
Consequently for any 2 ; G Cl'"/ the pseudo-inverse of Ap is given by 


At .2 = - 0 (F^z). 
de 

In particular, if A is partitioned into L equal blocks with each block corresponding to a coded 
diffraction pattern, the estimate update in the block Kaczmarz method (Step 3 of Alg. can be 
simplified to 

xi+i = ^ ® iF*xi), £ = 1, - ■ ■ ,L. 

The resulted block Kaczmarz method can be viewed as applying the ER update to each coded 
diffraction pattern sequentially, which is also known as the iterative transform algorithm in the 
optics community [321133j . though motivated from a different perspective here. However, the resid¬ 
ual reduction property stated in Q does not hold in general, see the remark about the Kaczmarz 
methods for linear equations at the end of section 1.3 


3.2.2 Successful recovery rate 


In the ID simulations, we set n = 256. For the Gaussian and unitary models, the algorithms are 
tested for 20 values of m selected according to m = round((i • n) with 6 taking 20 equally spaced 
values from 2 to 6 . For the coded diffraction model, the number of coded diffraction patterns varies 
from 2 to 12 . In this subsection, we will use the initial point suggested in [T3] for all the tested 
algorithms, i.e., we set 


xo = 



(15) 
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where z is the unit leading eigenvector of Ylr=iyr(^rO-r- It has been proved in [T3] that for the 
Gaussian and coded diffraction models xq can be arbitrarily close to x if m is the order of nlogn. 
Starting from the same initial point, ER and Wirtinger Flow are terminated after 2500 iterations 
are reached or the relative residual |||Ax;p — y\\ 2 / \\y \\2 is below 10“®, while the Kaczmarz methods 
are terminated after they have run 500 cycles or the maximum relative residual defined in (13) is 
below 10“^. For every pair of (n, m), 50 random tests are conducted. We consider a vector to be 
successfully recovered if the algorithm returns a vector x/ which has a small relative error, that is 
when 


rel.err ; = 


dist(x;, x) 


< 10 


-5 


We test the block Kaczmarz method with four different block sizes 


r^l G {n/8, n/4, n/2, n} = {32,64,128, 256}. 


The rows of the measurement matrix are partitioned into equal blocks (except the last block) with 
each block containing jF^I consecutive rows. The empirical probabilities of successful recoveries for 
different tested models and algorithms are plotted in figure We can make several observations 
out of the plots. 

For the Gaussian real model (figure (a)), we observe that the recovery probability curves of 
the simple Kaczmarz method and the block Kaczmarz method with block sizes 32, 64 and 128 are 
close to each other. They can successfully recover 85% of the test signals when m = 2.5n. There 
are several instances of successful recovery even when m = 2n. Note that there is a unique solution 
to 0 in real case only if m > 2n — 1 [Ij. In contrast, ER and Wirtinger Flow respectively need 4.5n 
and 5n number of measurements for successful recovery with high probability. For the complex case 
(figure (b)), the simple Kaczmarz method, the block Kaczmarz method with block sizes 32,64 
and ER have similar recovery probability curves which are slightly superior to Wirtinger Flow. 
They can successfully recover the test signals with high probability when m ~ 4n while the block 
Kaczmarz method with block size 128 needs m = 5n number of measurements. 

However, the block Kaczmarz method with block size n = 256 completely fails in both the real 


and complex cases. Here is a potential explanation for its failure. Example 2.2 indicates that x/+i 


is generally not the projection of x; onto the non-convex set 0 but only a heuristic approximation. 
Furthermore, theorem 2.3 shows that the deviation of x^+i from the optimal projection relies heavily 
on the condition numbers of the submatrices. In the block Kaczmarz method, is a IT^I x n 
matrix. Applying the Bai-Yin Law |34j heuristically shows that 




1 + y^jr^l/re 

1 - y/\Tr.\/n 


(16) 


Therefore when IT^I = n/8, 4/n, and n/2, the corresponding conditions number of Ap^ are ap¬ 
proximately 2, 3, and 6. However, when |r,.| = n, the condition number of Ap^ can be very large 
and hence x;+i cannot be a good approximation of the optimal projection of x; anymore even for 
very good approximations of the phase factor. 

Due to proposition |2.1| only the block Kaczmarz method with each block corresponding to a 
unitary matrix Qi is tested for the unitary model. The Kaczmarz method can successful recover 
the real signals with high probability when m ~ 3n, while ER needs 4.5n number of measurements 
(figure (c)). For the complex reconstruction problem (figure [^(d)), the recovery probability curve 
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Unitary, real 



Unitary, complex 
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CDP (ID), complex 
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(f) 


Figure 1: Empirical probability of successful recovery out of 50 random trials; ID: n = 256, 2D: 
m X n 2 = 256 X 256. In the coded diffraction model m = Ln or m = Lnin 2 with L only taking 
integral values. 
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of the Kaczmarz method is higher than that of ER when m < An. Wirtinger Flow does not work 
for the unitary model in neither the real nor the complex cases. 

For the ID coded diffraction model (figure [^(e)), the simplest Kaczmarz method has the largest 
probability of successful recovery when L = 3. All the tested algorithms can recover the signals 
successfully with probability greater than 90% when L = 4, including the block Kaczmarz method 
with block size n = 256. Notice that the recovery probability curves for the block Kaczmarz 
method with different block sizes are almost indistinguishable. In the coded diffraction model, 
the condition number of is less than \/6 for any IF^I < n. We also conduct tests for the 2D 
coded diffraction model for which the vectors x G sampled from Gaussian distribution 

AA(0,1,ijxn2) + *-^(0,/„jxn 2 )- The set-up for the 2D coded diffraction model is similar to the 
ID case, but with the ID DFT replaced by the 2D DFT. For the Kaczmarz methods, only the 
block variant with each block corresponding to a coded diffraction is tested because for the coded 
diffraction model, the block Kaczmarz method can be potentially much faster than the simple 
Kaczmarz method as it can take advantage of the FFT. In the 2D tests, we set ni = n 2 = 256. 
Figure (f) shows that the block Kaczmarz method is still able to successfully recover all the 
test signals when L = 4, which is independent of the dimensionality of the signals. However, ER 
and Wirtinger Flow respectively need 6n and lOn number of measurements to achieve the high 
probability recovery, compared with the 4n number of measurements needed for the ID case. 


3.2.3 Sensitivity to initial points 


For non-convex programming, the initialization procedure is very important and a good initial point 
can prevent the convergence to a local minimal solution. It has been reported in the literature that 
the performance of ER depends heavily on the initial points, see |35 (ll31fT4] . In this section, we will 
investigate the performance of the Kaczmarz methods when the initial points are generated ran¬ 
domly according to the standard Gaussian distribution. Figure [^compares the recovery probability 
curves of ER and the Kaczmarz methods under the random and spectral initialization. In general. 


the Kaczmarz methods with the spectral initialization (15) have higher recovery curves than the 


random initialization when the number of measurements is small. However, it only requires at most 
0.5n more number of measurements for the Kaczmarz methods with the random initialization to 
successfully recover all the test signals. In particular, the recovery curves of the block Kaczmarz 
methods for the coded diffraction model are nearly indistinguishable for the two different initializa¬ 
tion methods. In contrast, ER with the random initialization requires at least 2n more number of 
measurements to achieve the same recovery probability as the one with the spectral initialization. 
The randomly initialized ER fails all the tests for the real Gaussian and unitary measurements. 
For the Gaussian measurement model, it is also worth noting that the block Kaczmarz method is 
less sensitive to the initial points than the simple Kaczmarz method. 

Although the Kaczmarz methods are relatively less sensitive to the initial points than ER, 
starting from the initial points provided by the spectral method can reduce the computation time 
of all the tested algorithms. Therefore without further mention, we will use the initial point xq in 


(15) for all the tested algorithms in the remainder of this section. 


®For the other two models, 2D tests are essentially the same as ID tests after vectorizing the signals and mea¬ 
surement matrices. 
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Figure 2: Empirical probability of successful recovery out of 50 random trials for random and 
eigenvector initializatons; ID: n = 256, 2D: n\ x n 2 = 256 x 256. In the coded diffraction model 
m = Ln or m = Lnin 2 with L only taking integral values. 
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3.2.4 Computation time 


Next, we evaluate the computation time of the tested algorithms when solving the random problems 
to high accuracy. We conduct the tests for the Gaussian and coded diffraction models with a 
sufficient number of measurements so that all the tested algorithms can succeed in recovering all of 
the ten randomly generated signals. The number of measurements m is set to 6n for the Gaussian 
models and the coded diffraction ID model and m is set to 12nin2 for the coded diffraction 2D 
model. The maximum number of iterations (cycles) is set to 2500 (500) as in section 3.2.2, We 
set ei = 10“^*^ for ER and Wirtinger Flow, e 2 = 10“^ for the simple Kaczmarz method, and 
€2 = 10“® for the block Kaczmarz method. The average number of iterations or cycles and the 
average computation time are listed in table First, it can be observed that it takes a similar 
number of cycles for the simple and block Kaczmarz methods to achieve similar accuracy. However, 
the block Kaczmarz method is much faster in computation time as it can take advantage of the 
BLAS2 subroutines rather than BLASl for the Gaussian models and the fast Fourier transform for 
the coded diffraction models. For the Gaussian complex model and the coded diffraction models, 
the block Kaczmarz method are dramatically faster than ER and Wirtinger Flow because of its 
fast convergence rate. 


Table 1: Computational results of ER, Wirtinger Flow and the Kaczmarz methods for solving the 
problems to high accuracy. In the block Kaczmarz method, the block size is selected to be n/4 for 
the Gaussian model, and respectively n and ni x n 2 for the coded diffraction ID and 2D models. 




ER 


Wirtinger Flow 

#its 

time(s) 

rel.err 

#its 

time(s) 

rel.err 

Gaussian Real 

7 

0.032 

8.68e-13 

286 

0.886 

1.9e-10 

Gaussian Complex 

144 

1.067 

2.0e-10 

911 

7.011 

3.0e-10 

GDP ID 

157 

0.054 

1.99e-10 

909 

0.301 

2.94e-10 

GDP 2D 

149 

13.443 

1.3e-10 

315 

22.796 

1.91e-10 


simple Kaczmarz 

block Kaczmarz 


#cycles 

time(s) 

rel.err 

#cycles 

time(s) 

rel.err 

Gaussian Real 

10 

0.469 

4.81e-13 

8 

0.036 

6.81e-12 

Gaussian Complex 

24 

1.643 

1.14e-10 

26 

0.274 

3.04e-10 

GDP ID 

24 

1.927 

7.3e-ll 

25 

0.009 

3.13e-10 

GDP 2D 

- 

- 

- 

15 

1.478 

2.41e-ll 


3.2.5 Robustness to additive noise 


With the same number of measurements as in section 3.2.4 we further explore the performance of 
the algorithms under noisy measurements of the form 


y = max(|Ax|^ + e || |^xp ||2 • e, 0), 


(17) 


where e > 0 denotes the noise level and e G is uniformly distributed on the unit sphere. We 
test the algorithms with nine different noise levels from 10“® to 0.1. The plots of the relative errors 
against the noise levels are shown in figure for the Gaussian and coded diffraction models. The 
plots show clearly the desirable linear scaling between the noise levels and the relative errors for 
all the tested algorithms and tested models. 
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Figure 3: Log-log plots of relative errors vs noise levels; ID n = 256, 2D: ni = n 2 = 256. 
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3.3 Performance on molecules and natural images 




(a) 


(b) 


Figure 4: (a) projection of the 3D Caffeine molecule’s density map along z-axis, size: 128 x 128; 
(b) photograph of the Milky Way Galaxy, size 1080 x 1920. 

In this subsection, we evaluate the performance of the Kaczmarz methods on real images and 
molecules for the coded diffraction model. Based on the empirical observations in section 3.2 


apparently the block Kaczmarz method with each block corresponding to a coded diffraction pattern 
is highly recommended for this task compared to the other variants. So we only test this variant 
compared with ER and Wirtinger Flow, using the same stopping criteria as in section 3.2.4 The 


algorithms are tested for fonr different nnmbers of coded diffraction patterns L = 4, 8,12,16. For 
a fixed L, we repeat the tests ten times. 

We hrst run the algorithms on the projection of the 3D Caffeine molecule’s density map onto 
the xy-plane along the z-axis. Figure (a) makes a plot of this projection, which is a 128 x 128 
matrix. For the details on the computation of the projection, we refer the reader to [14t 136]. Next 
we test the algorithms on the Milky Way Galaxy photograph of size 1080 x 1920, see hgure|^ (b). 
Since it is an RGB photograph, we rnn the algorithms on each R, G, B channels independently. 
The computational results for both the projection of the molecule’s density map and the Milky 
Way Galaxy are listed in table For the Galaxy, we consider a random test to be successful if 
the algorithm can snccessfully reconstruct all the three R, G, B images for the generated coded 
diffraction patterns. The number of iterations (cycles) and the computation time for the Galaxy 
are average results over all the successful recoveries and the three R, G, B channels. 

First the table shows that L = 4 is sufficient for the block Kaczmarz method to snccessfully 
reconstruct both the molecule’s density map projection and the Galaxy from the magnitnde mea- 
snrements, which coincides with our observations in the random simnlations, see hgure[^ (f). For 
both ER and Wirtinger Flow, reconstructing the sophisticated Galaxy requires more numbers of 
the coded diffraction patterns than reconstructing the molecule’s density map projection. Regard¬ 
ing to the computation time, the block Kaczmarz method is overall ten times faster than ER and 
twenty times faster than Wirtinger Flow. 


4 Proof of theorem 2.4 


The proof of theorem 2.4 uses a result from [25], which is stated below. 
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Table 2: Computational results of ER, Wirtinger Flow and the block Kaczmarz method for recon¬ 
structing the 3D Caffeine molecule’s density projection and the RGB image Milky Way Galaxy to 
high accuracy. Four different numbers of coded diffraction patterns L = 4, 8,12,16 are tested. 


Caffeine Galaxy 




L = 4 

L = 8 

L = 12 

L = 16 

L = A 

L = 8 

L = 12 

L = 16 


#succ 

0 

6 

10 

10 

0 

2 

2 

10 

TTR 

rel.err 

- 

1.6e-10 

1.4e-10 

1.3e-10 

- 

1.7e-10 

1.3e-10 

1.2e-10 


#its 

- 

227 

95 

69 

- 

904 

353 

154 


time(s) 

- 

3.26 

2.11 

2.0 

- 

2038 

858.3 

686.2 


#succ 

0 

3 

10 

10 

0 

0 

4 

10 

Wirtinger 

rel.err 

- 

l.le-10 

2.0e-10 

1.7e-10 

- 

- 

1.9e-10 

1.7e-10 

Flow 

#its 

- 

1057 

279 

233 

- 

- 

467 

277 


time(s) 

- 

12.4 

4.8 

5.4 

- 

- 

1362.8 

1063.2 


#succ 

10 

10 

10 

10 

10 

10 

10 

10 

block 

rel.err 

7.8e-10 

l.le-10 

2.0e-ll 

9.4e-12 

7.2e-10 

1.3e-10 

3.4e-ll 

1.2e-ll 

Kaczmarz 

#cycles 

169 

23 

12 

8 

296 

28 

19 

13 


time(s) 

1.4 

0.4 

0.33 

0.3 

356 

69.7 

69.8 

65.3 


Lemma 4.1. Let Ax = y be a linear system with A G being standardized. Let xi-i be any 

vector in C” and xi be the random projection of x;_i computed from the randomized Kaczmarz 
method for the ineonsistent linear system Ax = y + e. Let x he the solution to Ax = y. Then we 
have 


E 


\Xl - X\\2 


< 1 - 


m 


\xi-i - x\\l -I- R^, 


where R = maxi<r<m 


Proof of theorem 2-4 ■ Denote by the estimate obtained when the rth row is selected, that is 


r , y/yfe" - {ar,Xl-l) 

Xi = Xi-i H-- —-K - ar, 


i'r\\2 


where = Z (a^, x;_i) , r = 1, • • • , m. Then at the (Z — l)th iteration one has E {xi = 
Expressed differently, xi can be viewed as the solution obtained by applying one iteration of the 
randomized Kaczmarz method for the inconsistent linear system 


Ax = y/yQe^^‘-^ where = 




e ^-1 


(18) 


In order to apply lemma 4.1 we will form a consistent linear system in each iteration as follows. 
Define fi-i = argmin^gjQ 27 r) = Z Then the linear system 




Ax = y/y Q where = 

is consistent and the solution to (19) is given by Rewrite (18) as 

Ax = Vy o + Viz 0 




(19) 
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Then applying lemma 4.1 gives 


E 

xi — xe*"^'"* 

2' 


xi-i — xe*'^*“^ 



2 

\ m J 





where 


max 

l<r<m 


Vih (e 


M-i _ 


< 2 max := R, 

l<r<m 


and the expectation is conditioned on xi-i. Since for every 1 < r < m, dist(x[, x) < ||x[ — II 2 , 

we have 


E [dist^(x;, x)] < E 


xi — xe 




m 


xi-i — xe 


i’t'i-i 


+ R^ 


= 1 - 


^mln(^)^ dist^(x;_i, x) + R^, 


m 


where the last equality follows from the definition of dist(-, •). Taking full expectation on both sides 
and applying the resulted relationship repeatedly yields 


E [dist2(xi,x)] < (^1-dist2(xo,x) + ^ [ 1 - 


i-i 


< 1 1 - ] dist2(xo, X) + 


m 


k=o ^ 
mR? 


which concludes the proof by reintroducing the value of R. 


□ 


5 Conclusion and future directions 


Phase retrieval is an important topic which has received intensive investigations recently. This 
manuscript develops the Kaczmarz methods for the generalized phase retrieval problem. The 
empirical results demonstrate that the Kaczmarz methods are superior to ER and Wirtinger Flow 
in terms of the successful recovery probabilities and overall computation time. For the block 
Kaczmarz method, we show that the condition numbers of the submatrices play a key role in 
successful recovery in theorem 2.3 To the best of our knowledge, this is the first paper which 


suggests applying the Kaczmarz methods to solve the systems of phaseless equations. 

A central question that remains to be answered is how many measurements are needed for the 
Kaczmarz methods to successfully find the solution to the generalized phase retrieval problem. For 
the real case when the signals and measurement vectors are all real-valued, the Kaczmarz methods 
for phase retrieval can reduce to the Kaczmarz methods for linear equations if the initial point 
is close enough to the true solution x because of the separation of x and —x. So in principle, 
0(n log n) number of measurements are sufficient for the Gaussian real model, see the remark after 
However, this is not true for the complex measurements because xe*® is continuous with 


respect to 0 G [0, 27r). For the randomized Kaczmarz methods, notice that the proof of theorem 2.4 
does not use any information provided by the phase selection heuristic. A recovery guarantee is 
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Figure 5: Convergence rate comparison of the simple Kaczmarz method with and without relaxation 
for the Gaussian complex model with n = 256 and m = 4n. The relaxation parameter A; is selected 
to be 1 + n/m in each iteration. Averaged over 50 random tests. 


likely to require an analysis of how well the phase selection heuristic approximates the phase of the 
true solution in each iteration. 

This manuscript opens a door of applying the existing techniques in the accelerated Kaczmarz 
methods for linear equations to the phase retrieval problem, for example by introdncing relaxation. 
With the relaxation, the update rule in the simple Kaczmarz method (Alg. becomes 


xi+i = xi + Xi 


- {ar,xi) 


dr- . 


It is snggested in [22] that the randomized simple Kaczmarz method for linear systems can be 
accelerated for Gaussian measurement matrices if relaxation parameter is set to A; = 1 + n/m in 
each iteration. The numerical simulations show that the same selection of the relaxation parameter 
can also accelerate the simple Kaczmarz method for phase retrieval, see figure The Kaczmarz 
methods for phase retrieval are designed by extending one of the existing linear solvers to solve the 
system of qnadratic eqnations. Other typical linear solvers snch as conjugate gradient method may 
well be similarly effective. 

As demonstrated in the literature mi, ER often works much better if a priori knowledge about 
the signals is incorporated, such as real-valued, nonnegative and sparsity. Despite the already very 
good performance of the Kaczmarz methods for our test problems, it is worth investigating whether 
their performance can be fnrther improved by exploring the strnctnres of the signals. Finally, it 
may also be possible to extend the Kaczmarz methods to other related problems, such as blind 
deconvolution m and self-calibration |38j . 
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